A Hungarian child database for speech processing applications
Abstract
This paper introduces a new Hungarian database of speech recorded from children. We discuss the criteria considered when selecting the speakers and composing the corpus, and we present the final content of the corpus. We describe the recording method and the post-processing of the recorded material. The paper also touches on the possible applications of the database. The difficulties faced during the work, arising mainly from the age of our speakers, are reported.

INTRODUCTION

The processing of children's speech is becoming increasingly important in speech technology research and development, mainly in speech therapy and speech recognition [1],[2],[3]. Since many speech processing applications need a good speech database, collecting a children's speech corpus is an important task. The database reported in this paper was made primarily for the SPECO project of the INCO Copernicus programme of the European Commission, titled "A multimedia multilingual teaching and training system for handicapped children" [6], but we tried to make the corpus widely usable.

COMPOSITION OF THE TEXT MATERIAL

The database was collected mainly to support distance score evaluation of speech parameters for the Hungarian fricatives, affricates and vowels, comparing children with correct pronunciation and speech-handicapped children. The text material therefore had to contain, at a minimum, all of these Hungarian phonemes in isolated form, in sound connections, in words and in sentences. In sound connections, every vowel occurs in combination with bilabial, alveolar and velar bursts, to exhibit coarticulation effects. In words, every examined speech sound occurs in all positions and in all typical sound connections. One-, two- and three-syllable words are included. The sentences are designed to present the typical Hungarian intonation forms. The structure of the text is shown in Table 1.
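As an illustration of the sound-connection items described above, the following minimal Python sketch enumerates every burst-plus-vowel combination. The vowel symbols are copied from Table 1 as they appear there (the list may be incomplete), and the function name `burst_vowel_items` is hypothetical, not part of the original work.

```python
# Vowel symbols as listed in Table 1 (SAMPA-like notation).
# Assumption: the list is copied as printed and may be incomplete.
VOWELS = ["O", "A:", "E", "e:", "i", "o", "2", "u", "y"]

# Bursts (stop consonants) by place of articulation, as named in the text.
BURSTS = {"bilabial": "p", "alveolar": "t", "velar": "k"}


def burst_vowel_items(vowels=VOWELS, bursts=BURSTS):
    """Return (place, burst+vowel) pairs, grouped vowel by vowel."""
    return [
        (place, stop + vowel)
        for vowel in vowels
        for place, stop in bursts.items()
    ]


items = burst_vowel_items()
print(len(items))   # number of sound-connection items to record
print(items[:3])    # the three bursts combined with the first vowel
```

Grouping the items vowel by vowel mirrors the ordering of the examples in Table 1, where the three bursts are listed in sequence for each vowel.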
To make the database widely usable (e.g. for speech recognition), we aimed to provide material that is as phonetically rich as possible, including the most frequent Hungarian phonemes and sound connections. An earlier detailed statistical examination [4] found that half-syllable units give the most compact description of the phonological structure of the Hungarian language, which is why we tried to compose material rich in half syllables. We analyzed the frequency of the half syllables occurring in the whole material. The result is shown in Table 2.

Constructing a good children's speech database is a very difficult task. Two aspects had to be considered. On the one hand, the text material had to be large enough to represent the language as fully as possible. On the other hand, we had to take the age of our speakers into account: the material cannot be arbitrarily long, especially when collecting children's speech. For example, the spoken utterances must not take longer than 10-15 minutes, especially in the case of 5-year-old speakers. It is also important that the active vocabulary of 5-year-old children is much smaller than that of adults.

SPEAKER SELECTION

In our research we focused on children aged 5-10 years. The distribution of the speakers by age is shown in Table 3. Because the speech database was made primarily for a teaching and training system for speech-handicapped children, it was important to study the voices not only of children with good and average pronunciation but also of speech-handicapped children. Therefore we included children with speech defects in the database (approximately 40%), but these are only a few examples.

6th European Conference on Speech Communication and Technology (EUROSPEECH'99), Budapest, Hungary, September 5-9, 1999. ISCA Archive, http://www.isca-speech.org/archive

Table 1.
The structure of the text

Spoken utterance types                                        Examples
Sustained voices     vowels                                   O, A:, E, e:, i, o, 2, u, y
                     fricatives                               s, S, z, Z, f, v
Voice-connections    vowels with bilabial (p), alveolar (t)   pi: ti: ki, py: ty: ky, etc.
                     and velar (k) burst
Digits 0-10                                                   nullO (0), Ed' (1), kEtt2: (2), etc.
76 Words             monosyllabic                             z2ld (green)
                     disyllabic                               t_sit_sO (cat)
                     trisyllabic                              mu:zEum (museum)
29 Sentences         falling                                  O fA:n mo:kuS volt (There was a squirrel on the tree)
with 5 different     quick falling                            mEjik? (Which?)
intonation forms     rising-falling                           bOlA:Z hol vOn? (Where is Balázs?)
                     quick falling-falling                    kOti zEne:l? (Is Kate playing music?)
                     floating                                 nEm, mA:r hOzOmEnt. (No, she already went home.)

Table 2. Occurrence of the 40 most frequent Hungarian a.) beginning and b.) ending half syllables

Columns (for both a.) and b.)): type of the half syllable; frequency in the 20 Hungarian prose; frequency in the text of the Hungarian child database
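The kind of half-syllable count summarized in Table 2 can be illustrated with a minimal Python sketch. The paper does not state how syllables were split, so the sketch assumes a common convention: each syllable is divided at its vowel, giving a beginning half (onset plus vowel) and an ending half (vowel plus coda). The vowel set, the hand-syllabified sample words and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Assumption: Hungarian vowels in SAMPA-like notation, extending the
# symbols shown in Table 1 with their short/long counterparts.
VOWELS = {"O", "A:", "E", "e:", "i", "i:", "o", "o:",
          "2", "2:", "u", "u:", "y", "y:"}


def half_syllables(syllable):
    """Split one syllable (a list of phoneme symbols) at its first vowel.

    Returns (beginning, ending): onset+vowel and vowel+coda.
    """
    for idx, phoneme in enumerate(syllable):
        if phoneme in VOWELS:
            return tuple(syllable[: idx + 1]), tuple(syllable[idx:])
    raise ValueError(f"no vowel in syllable: {syllable}")


def count_half_syllables(syllables):
    """Count beginning and ending half syllables separately."""
    beginnings, endings = Counter(), Counter()
    for syllable in syllables:
        beginning, ending = half_syllables(syllable)
        beginnings[beginning] += 1
        endings[ending] += 1
    return beginnings, endings


def relative_frequency(counter):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {unit: count / total for unit, count in counter.items()}


# Hypothetical hand-syllabified input: "z2ld" and "mu:-zE-um" from Table 1.
corpus = [["z", "2", "l", "d"], ["m", "u:"], ["z", "E"], ["u", "m"]]
beginnings, endings = count_half_syllables(corpus)
print(beginnings.most_common(3))
print(relative_frequency(endings))
```

Keeping beginning and ending halves in separate counters matches the a.) and b.) split of Table 2, so the same routine could be run on a reference text and on the database text and the two sets of frequencies compared.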